Morpheme Conversion for Connecting Speech Recognizer and Language Analyzers in Unsegmented Languages
Abstract
Connecting automatic speech recognizers (ASRs) to language analyzers is difficult because the two may be based on different part-of-speech (POS) systems, so the latter cannot directly analyze the outputs of the former. In addition, in unsegmented languages such as Japanese, the ASR outputs are likely to be segmented into words differently from the inputs the language analyzer expects, because the two components are developed independently. A conventional approach is to generate raw text from the ASR outputs and re-analyze it with a morphological analyzer. However, if the ASR outputs contain recognition errors, the morphological analyzer analyzes them incorrectly, even where they contain correctly recognized words. To avoid this problem, we propose a morpheme conversion method that directly converts ASR outputs into morpheme sequences suitable for the language analyzers. Our experiments show that morpheme conversion is more robust against recognition errors than the conventional approach.
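The contrast between the two pipelines can be illustrated with a toy sketch. This is not the authors' implementation: the lexicon, the conversion table, the POS tag names, and the greedy longest-match segmenter standing in for a morphological analyzer are all assumptions made up for illustration, using romanized strings written without spaces to mimic an unsegmented language.

```python
# Toy illustration only: every entry and tag below is hypothetical.

# Lexicon of the downstream analyzer: surface form -> analyzer POS tag.
ANALYZER_LEXICON = {
    "kore": "pronoun",
    "wa": "particle",
    "hon": "noun",
    "wakon": "noun",
    "desu": "auxiliary",
}

# Hypothetical conversion table: (ASR surface, ASR POS) -> analyzer morphemes.
CONVERSION_TABLE = {
    ("kore", "PRON"): [("kore", "pronoun")],
    ("wa", "PRT"): [("wa", "particle")],
    ("hon", "NOUN"): [("hon", "noun")],
    ("desu", "AUX"): [("desu", "auxiliary")],
}


def reanalyze(raw):
    """Conventional route: segment raw text by greedy longest match."""
    morphemes = []
    i = 0
    while i < len(raw):
        match = None
        # Try the longest candidate substring first.
        for j in range(len(raw), i, -1):
            if raw[i:j] in ANALYZER_LEXICON:
                match = raw[i:j]
                break
        if match is None:
            # No lexicon entry starts here: emit one unknown character.
            morphemes.append((raw[i], "unknown"))
            i += 1
        else:
            morphemes.append((match, ANALYZER_LEXICON[match]))
            i += len(match)
    return morphemes


def convert(asr_tokens):
    """Conversion route: map each ASR token on its own via the table."""
    morphemes = []
    for surface, pos in asr_tokens:
        # Tokens missing from the table pass through tagged as unknown.
        morphemes.extend(CONVERSION_TABLE.get((surface, pos), [(surface, "unknown")]))
    return morphemes


# ASR output in which "hon" was misrecognized as "kon".
asr_output = [("kore", "PRON"), ("wa", "PRT"), ("kon", "NOUN"), ("desu", "AUX")]

raw_text = "".join(surface for surface, _ in asr_output)
conventional = reanalyze(raw_text)
converted = convert(asr_output)
```

In this toy case the re-analysis route merges the correctly recognized particle "wa" with the misrecognized token into the lexicon entry "wakon", while the token-by-token conversion leaves "wa" intact and confines the damage to the erroneous token.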
Similar resources
Modeling Morphosyntax with Finite-state Transducers and Its Application to Hungarian LVCSR
Large vocabulary speech recognition systems for several languages have to use morphemes as the basic recognition units. Such systems frequently suffer from the over-generation property of the smoothed N-gram language model. The source of the problem is that most of the function-morphemes are very short and their unigram likelihood is high. These morphemes are inserted frequently in the ...
Utilizing prosody for unconstrained morpheme recognition
Speech recognition systems for languages with a rich inflectional morphology (like German) suffer from the limitations of a word-based full-form lexicon. Although the morphological and acoustical knowledge about words is coded implicitly within the lexicon entries (which are usually closely related to the orthography of the language at hand) this knowledge is usually not explicitly available for ...
Language of General Fuzzy Recognizer
In this note first by considering the notion of general fuzzy automata (for simplicity GFA), we define the notions of direct product, restricted direct product and join of two GFA. Also, we introduce some operations on (Fuzzy) sets and then prove some related theorems. Finally we construct the general fuzzy recognizers and recognizable sets and give the notion of (trim) reversal of a given GFA....
Morpheme-Based Language Modeling for Amharic Speech Recognition
This paper presents the application of morpheme-based and factored language models in an Amharic speech recognition task. Since using morphemes in both acoustic and language models results, mostly, in performance degradation due to acoustic confusability and since it is problematic to use factored language models in standard word decoders, we applied the models in a lattice rescoring framework....
Morfessor and variKN machine learning tools for speech and language technology
This paper introduces two recent open source software packages developed for unsupervised natural language modeling. The Morfessor program segments words automatically into morpheme-like units without any rule-based morphological analyzers. The VariKN toolkit trains language models producing a compact set of high-order n-grams utilizing state-of-art Kneser-Ney smoothing. As an example, this pape...